The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

Abstract

We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Olaf Scholz was the ninth Chancellor of Germany", it will not automatically be able to answer the question, "Who was the ninth Chancellor of Germany?". Moreover, the likelihood of the correct answer ("Olaf Scholz") will not be higher than for a random name. Thus, models exhibit a basic failure of logical deduction and do not generalize a prevalent pattern in their training set (i.e. if "A is B" occurs, "B is A" is more likely to occur). We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of 'Abyssal Melodies'" and showing that they fail to correctly answer "Who composed 'Abyssal Melodies'?". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. This shows a failure of logical deduction that we hypothesize is caused by the Reversal Curse. Code is available at https://github.com/lukasberglund/reversal_curse.
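The forward/reverse comparison described above can be illustrated with a minimal sketch. This is not the authors' evaluation code (see the linked repository for that). The helper names `build_pairs`, `accuracy`, and `toy_complete` are hypothetical, and the "model" is a toy lookup table that has memorized only the forward direction, standing in for a finetuned LLM.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (prompt, expected completion)

# Facts in "A is B" form, taken from the examples in the abstract.
FACTS: List[Tuple[str, str]] = [
    ("Uriah Hawthorne", "the composer of 'Abyssal Melodies'"),
    ("Olaf Scholz", "the ninth Chancellor of Germany"),
]


def build_pairs(facts: List[Tuple[str, str]]) -> Tuple[List[Example], List[Example]]:
    """Build forward (name -> description) and reverse (description -> name) prompts."""
    forward = [(f"{name} is", desc) for name, desc in facts]
    reverse = [(f"{desc[0].upper()}{desc[1:]} is", name) for name, desc in facts]
    return forward, reverse


def accuracy(complete: Callable[[str], str], examples: List[Example]) -> float:
    """Fraction of prompts whose completion contains the expected string."""
    hits = sum(expected.lower() in complete(prompt).lower()
               for prompt, expected in examples)
    return hits / len(examples)


forward_examples, reverse_examples = build_pairs(FACTS)

# Toy stand-in for a finetuned model (an assumption for illustration only):
# it has seen each fact in the forward order and nothing else.
memorized: Dict[str, str] = dict(forward_examples)


def toy_complete(prompt: str) -> str:
    """Return the memorized completion, or an empty string for unseen prompts."""
    return memorized.get(prompt, "")


forward_acc = accuracy(toy_complete, forward_examples)
reverse_acc = accuracy(toy_complete, reverse_examples)
print(f"forward accuracy: {forward_acc:.2f}, reverse accuracy: {reverse_acc:.2f}")
```

To evaluate a real model, `toy_complete` would be replaced by a call that returns the model's completion for a prompt; the pair construction and scoring stay the same.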
